Fast Context-Free Parsing Requires Fast Boolean Matrix Multiplication
Abstract
Valiant showed that Boolean matrix multiplication (BMM) can be used for CFG parsing. We prove a dual result: CFG parsers running in time $O(|G||w|^{3-\epsilon})$ on a grammar $G$ and a string $w$ can be used to multiply $m \times m$ Boolean matrices in time $O(m^{3-\epsilon/3})$. In the process we also provide a formal definition of parsing motivated by an informal notion due to Lang. Our result establishes one of the first limitations on general CFG parsing: a fast, practical CFG parser would yield a fast, practical BMM algorithm, which is not believed to exist.

1 Introduction

The context-free grammar (CFG) formalism was developed during the birth of the field of computational linguistics. The standard methods for CFG parsing are the CKY algorithm (Kasami, 1965; Younger, 1967) and Earley's algorithm (Earley, 1970), both of which have a worst-case running time of $O(gN^3)$ for a CFG (in Chomsky normal form) of size $g$ and a string of length $N$. Graham et al. (1980) give a variant of Earley's algorithm which runs in time $O(gN^3/\log N)$.

Valiant's parsing method is the asymptotically fastest known (Valiant, 1975). It uses Boolean matrix multiplication (BMM) to speed up the dynamic programming in the CKY algorithm: its worst-case running time is $O(gM(N))$, where $M(m)$ is the time it takes to multiply two $m \times m$ Boolean matrices together. The standard method for multiplying matrices takes time $O(m^3)$. There exist matrix multiplication algorithms with time complexity $O(m^{3-\delta})$; for instance, Strassen's has a worst-case running time of $O(m^{2.81})$ (Strassen, 1969), and the fastest currently known has a worst-case running time of $O(m^{2.376})$ (Coppersmith and Winograd, 1990). Unfortunately, the constants involved are so large that these fast algorithms (with the possible exception of Strassen's) cannot be used in practice.
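To make the cubic baseline concrete, here is a minimal sketch of a CKY-style recognizer for a grammar in Chomsky normal form. It is an illustration only, not the formulation used in the paper: the function name `cky_recognize`, the dictionary encoding of the rules, and the toy grammar are all assumptions introduced for this example. The three nested loops over span length, start position, and split point are what make the running time cubic in the string length for a fixed grammar.

```python
from itertools import product


def cky_recognize(words, unary_rules, binary_rules, start="S"):
    """Return True if `words` can be derived from `start`.

    unary_rules:  dict terminal -> set of nonterminals A with a rule A -> terminal
    binary_rules: dict (B, C)   -> set of nonterminals A with a rule A -> B C
    (Both encodings are hypothetical choices made for this sketch.)
    """
    n = len(words)
    if n == 0:
        return False  # a CNF grammar without an epsilon rule derives no empty string

    # chart[i][j] holds every nonterminal that derives words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]

    # Spans of length 1 come directly from the terminal rules.
    for i, word in enumerate(words):
        chart[i][i + 1] = set(unary_rules.get(word, ()))

    # Longer spans are built by combining two adjacent shorter spans.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two halves
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary_rules.get((b, c), set())

    return start in chart[0][n]


# Toy CNF grammar for { a^n b^n : n >= 1 }:
#   S -> A B | A X,  X -> S B,  A -> 'a',  B -> 'b'
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}

if __name__ == "__main__":
    for text in ("ab", "aabb", "aab", "abab"):
        print(text, cky_recognize(list(text), UNARY, BINARY))
```

The combination step for a fixed split point is itself a Boolean-product-like operation over the chart, which is the part of the dynamic programming that Valiant's method hands to BMM.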
As matrix multiplication is a very well-studied problem (see Strassen's historical account (Strassen, 1990, section 10)), it is highly unlikely that simple, practical fast matrix multiplication algorithms exist. Since the best BMM algorithms all rely on general matrix multiplication,¹ it is widely believed that there are no practical $O(m^{3-\delta})$ BMM algorithms. One might therefore hope to find a way to speed up CFG parsing without relying on matrix multiplication. However, we show in this paper that fast CFG parsing requires fast Boolean matrix multiplication in a precise sense: any parser running in time $O(gN^{3-\epsilon})$ that represents parse data in a retrieval-efficient way can be converted with little computational overhead into an $O(m^{3-\epsilon/3})$ BMM algorithm. Since it is very improbable that practical fast matrix multiplication algorithms exist, we thus establish one of the first nontrivial limitations on practical CFG parsing.

¹ The "four Russians" algorithm (Arlazarov et al., 1970), the fastest BMM algorithm that does not simply use ordinary matrix multiplication, has worst-case running time $O(m^3/\log m)$.

Our technique, adapted from that used by Satta (1994) for tree-adjoining grammar (TAG) parsing, is to show that BMM can be efficiently reduced to CFG parsing. Satta's result does not apply to CFG parsing, since it explicitly relies on the properties of TAGs that allow them to generate non-context-free languages.

2 Definitions

A Boolean matrix is a matrix with entries from the set $\{0, 1\}$. A Boolean matrix multiplication algorithm takes as input two $m \times m$ Boolean matrices $A$ and $B$ and returns their Boolean product $A \times B$, which is the $m \times m$ Boolean matrix $C$ whose entries $c_{ij}$ are defined by
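Under the standard definition of the Boolean product, $c_{ij} = \bigvee_{k=1}^{m} (a_{ik} \wedge b_{kj})$, so $c_{ij} = 1$ exactly when some index $k$ has $a_{ik} = b_{kj} = 1$. A minimal Python sketch of the straightforward cubic-time method follows; the function name `boolean_matrix_product` and the list-of-lists representation are assumptions made for illustration, not part of the paper.

```python
def boolean_matrix_product(a, b):
    """Boolean product of two m x m matrices with entries in {0, 1}.

    c[i][j] is 1 exactly when some k has a[i][k] == 1 and b[k][j] == 1.
    This is the plain O(m^3) method, not one of the fast algorithms.
    """
    m = len(a)
    c = [[0] * m for _ in range(m)]
    for i in range(m):
        for k in range(m):
            if a[i][k]:  # rows of b matter only where a[i][k] is set
                for j in range(m):
                    if b[k][j]:
                        c[i][j] = 1
    return c


if __name__ == "__main__":
    A = [[1, 0],
         [0, 0]]
    B = [[0, 1],
         [0, 0]]
    print(boolean_matrix_product(A, B))
```

In the example, the only witness is $k = 1$ for entry $(1, 2)$, so the product has a single 1 in its top-right corner.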
Similar resources
CFG Parsing and Boolean Matrix Multiplication
In this work the relation between Boolean Matrix Multiplication (BMM) and Context Free Grammar (CFG) parsing is shown. The first described approach, which is due to Valiant (1975), shows how CFG parsing can be reduced to Boolean Matrix Multiplication. Afterwards the reverse direction, i.e. how a CFG parser can be used to multiply two Boolean matrices, is presented, which is due to Lee (2002). T...
Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!
In 1975, a breakthrough result of L. Valiant showed that parsing context-free grammars can be reduced to Boolean matrix multiplication, resulting in a running time of $O(n^{\omega})$ for parsing, where ω ≤ 2.373 is the exponent of fast matrix multiplication and n is the string length. Recently, Abboud, Backurs and V. Williams (FOCS 2015) demonstrated that this is likely optimal; moreover, a combinatorial ...
Clique-Based Lower Bounds for Parsing Tree-Adjoining Grammars
Tree-adjoining grammars are a generalization of context-free grammars that are well suited to model human languages and are thus popular in computational linguistics. In the tree-adjoining grammar recognition problem, given a grammar Γ and a string s of length n, the task is to decide whether s can be obtained from Γ. Rajasekaran and Yooseph's parser (JCSS'98) solves this problem in time $O(n^{2\omega})$, ...
Fast Stochastic Context-Free Parsing: A Stochastic Version of the Valiant Algorithm
In this work, we present a fast stochastic context-free parsing algorithm that is based on a stochastic version of the Valiant algorithm. First, the problem of computing the string probability is reduced to a transitive closure problem. Then, the closure problem is reduced to a matrix multiplication problem of matrices of a special type. Afterwards, some fast algorithm can be used to solve the ...